ABSTRACT

The investigation examined relationships among scales
for observing and rating teacher performance. Beginning teachers with
varying levels of professional experience (2, 9, and 16 months) were
rated by pairs of observers on two occasions. Intercorrelations
across occasions fell between .5 and .8. Interrater agreement ranged
between .5 and .9. Factor analyses revealed about 67 percent common
variance among the scales. Two rotated factors characterized "direct
instruction" and "classroom control" dimensions. The extent of
unidimensional variance is discussed in relation to underlying "true"
versus "attributional" (halo effects) sources of common variance.
(Author)
Observational Ratings of Teaching Performance:
Dimensionality and Stability

The investigation examined relationships among scales for observing and
rating teacher performance. Beginning teachers with varying levels of
professional experience (2, 9, and 16 months) were rated by pairs of observers
on two occasions. Intercorrelations across occasions fell between .5 and .8.
Interrater agreement ranged between .5 and .9. Factor analyses revealed
about 67% common variance among the scales. Two rotated factors
characterized "direct instruction" and "classroom control" dimensions. The extent of
unidimensional variance is discussed in relation to underlying "true" versus
"attributional" (halo effects) sources of common variance.
INTRODUCTION

The investigation concerns consistency among observers' ratings of
teaching performance. Three forms of consistency are at issue: (1)
cross-rater agreement: do persons who simultaneously observe teachers
and pupils agree with one another? (2) cross-occasion stability: are
ratings of the same teacher across occasions similar? (3) dimensional
consistency: are different aspects of teaching performance rated
similarly? Interrater agreement, stability, and dimensionality are
elements that are integral to analyses of the generalizability of any
set of observations (Shavelson and Dempsey-Atwood, 1976; Shavelson and
Webb, 1981).

Specifically, the report describes a study of relationships among
16 scales for observing and rating teaching performance. The rating
scales comprise the Teacher and Pupil Performance Ratings (TePPR), a new
instrument for assessing teaching performance, including aspects of
pupil behavior and classroom environment that reflect teaching
effectiveness (Nelsen, Ray, Knight, and Brook, note 1). This report
presents data concerning interrater agreement among the observers and
concerning the stability of ratings on each scale, as the same teachers
were rated on two occasions. The report also describes the extent to
which the 16 scales intercorrelated with one another, that is, the
proportion of variance among the scales that was common, and the
factorial structure of the scales.



Background

Two decades ago, in the first Handbook of Research on Teaching, Medley
& Mitzel (1963) declared that rating approaches had proven "uniformly
unsuccessful in yielding measures of teaching skill." A major source of
unreliability and invalidity of ratings, the authors noted, was
contamination of measures by halo effects, i.e., the influence of raters'
general impressions upon their specific judgments across items on the
instrument. They further pointed out that halo effects spuriously
inflate (a) coefficients of observer agreement, (b) stability
coefficients, and (c) internal consistency among items on a scale.
A more tempered appraisal of the utility of observational ratings
was presented by Rosenshine & Furst (1973) in the Second Handbook of
Research on Teaching. Based upon earlier reviews of studies in which
both rating and category systems were used to predict student
achievement (Rosenshine & Furst, 1971; Rosenshine, 1971), they concluded that
the most significant results had been obtained using rating scales,
although certainly not all rating scales predicted student learning. An
advantage of rating scales, they noted, is the possibility for the
observer to process many cues before making a decision. A disadvantage,
on the other hand, is that specific details about the sequence, context,
and forms of teacher behavior are typically not provided by rating
methods.

Rating scales, and other measurement procedures that rely upon
perceptions and attributions by observers, yield data that are
contaminated by observer errors. Such errors include halo effects and other
expectancy effects, differential interpretations of key terms, and
judgments that vary because different standards of comparison are
employed by different raters (Cooper, 1981; Fiske, 1978). Measures
most vulnerable to such observer errors are those in which key terms and
instructions are ill-defined and vague. For example, many teacher
rating scales elicit judgments about general characteristics, such as
warmth, enthusiasm, or sense of humor, while failing to specify the
referent behaviors upon which the observer should focus. Also, rating
scales often elicit judgments about characteristics without specifying
the situational context, temporal boundaries, or other essential facets
that might focus the observers' attention upon specific events (Fiske,
1978).


Critics of rating scales and attributional measures advocate
observational procedures which focus upon specific, narrowly defined
acts that can be reliably coded (Medley & Mitzel, 1963; Fiske, 1978).
The development and use of such procedures, which have been
characterized as "low inference" measures (Rosenshine & Furst, 1973), have
undoubtedly contributed to the description and analysis of teaching and
learning processes (cf. Good & Brophy, 1978). Evidence concerning
particular teacher and pupil behaviors that are indicators of
instructional effectiveness has been accumulating, but, to date, no set of
specific behavioral indices has emerged as sufficiently basic,
comprehensive, or consensually accepted that it could serve as an indicator of
competency or general teaching performance (Rosenshine & Furst, 1973).

If low inference measures cannot satisfy the need for economical
and comprehensive performance appraisal, and if rating procedures
continue to be used despite their unreliability, then evaluators should
concentrate upon improvement of rating instruments and reduction of
observer errors.

A variety of methods has been employed to reduce halo and increase
the accuracy of ratings (Cooper, 1981). Cooper's review of these
studies suggested that four methods were most promising as means of
reducing illusory halo: increasing rater-ratee familiarity, using
multiple raters, rating from current exposure, and obtaining ratings of
central irrelevant categories. Cooper also noted the need for more
basic research on how perceptual processes affect rater error.

Meanwhile, the demand for comprehensive indicators of teaching
performance and competency continues to grow, as policies and procedures
are being developed for certification of competency, tenure decisions,
and merit pay. Despite their flaws, rating procedures have continued to
serve these functions, and new scales have continued to be developed.
For example, the states of Georgia and South Carolina have invested
substantial sums of money developing instruments and procedures to
certify beginning teachers (Capie, W., Johnson, C. F., Anderson, S. J.,
Ellett, C. D., & Okey, J. R., note 2; Stulac II, J. F., Gettone, V. G.,
and others, note 3).

The Teacher Performance Assessment Instruments (TPAI; Capie et al.,
note 2) and the Assessments of Performance in Teaching (Stulac et al.,
note 3) were developed to assess minimum proficiency of beginning
teachers. These instruments have incorporated improved methods for
observing and judging performance. Observer training programs have been
established, and the conditions for observing teachers have been
structured and standardized. Ratings are obtained on several occasions by at
least two raters, so data can be analyzed to determine the extent to
which the ratings are generalizable across occasions and raters (Capie,
note 4). However, the instruments were designed for a specific purpose,
i.e., to elicit discrete judgments concerning the presence or absence of
certain minimum proficiencies, rather than to measure a broader range of
differences in performance levels. The characteristics to be assessed
by the instruments were determined by surveys of teachers' and other
professionals' opinions concerning "essential competencies," rather than
on the basis of systematic theory or research on characteristics of
effective teachers. Furthermore, given the large number of
characteristics encompassed by these instruments, the time and costs for
observing each teacher are substantial.

A review of teacher observation instruments reported in Simon &
Boyer (1970) and Borich & Madden (1977) did not yield examples of
teacher rating instruments that provide brief, but comprehensive
observational ratings of teacher performance. That is, there
appeared to be no instrument that (a) focused upon aspects of teaching
performance and pupil behavior that had been shown by research to be
related to teaching effectiveness, (b) specified aspects of performance
that represented unsatisfactory, satisfactory, and excellent
performance, (c) was sufficiently concise to broadly assess teaching
performance in an hour or less, and (d) was, at the same time, sufficiently
comprehensive to yield an overall assessment of teaching performance.

The Teacher and Pupil Performance Ratings (TePPR) is a new
instrument developed to assess performance of beginning teachers in
classrooms. The TePPR was designed to provide a comprehensive but brief
appraisal of a teacher's performance in the classroom, including
cognitive, affective, and interactional aspects of teaching. The TePPR also
assesses aspects of pupil behavior and the classroom environment that
presumably relate to instructional effectiveness. Certain of the
performance dimensions, i.e., clarity of presentation, pupil engagement,
and range of interaction, were derived from studies of characteristics
associated with instructional effectiveness (cf. Rosenshine & Furst,
1973; Good & Brophy, 1978; Mori iave, note 5). Other aspects of
performance, e.g., physical organization of the classroom and demonstration of
personal regard, were included to study their potential validity as
performance indicators.

The scales were designed to differentiate between levels of
performance, ranging from poor or unsatisfactory to excellent, as well as
to discriminate between adequate and inadequate performance. The
primary purpose for developing the TePPR was to provide descriptive data
to account for on-the-job performance of graduates from teacher
education programs at Arizona State University. In its current form, and
until predictive validity studies have been completed, it is recommended
that the instrument be used only for such descriptive or research
purposes, and not as part of an assessment tool for decisions about
individual teachers.

As part of the development of the TePPR, data on performance levels
of teachers with different levels of experience were gathered as
evidence of construct validity. Also, data concerning interrater
agreement, stability of ratings, and intercorrelations among the scales
were obtained. These data provide basic evidence concerning the
reliability or generalizability of the observations. This report presents
these data. It also presents analyses of the factorial structure and of
the extent of unidimensionality (or halo) that is manifested in the
ratings.



Method 



Sample

Recent graduates from teacher education programs at Arizona State
University (ASU) comprised the target population. The study included
beginning elementary and secondary teachers who had been employed in
seven public school districts within a proximity of about 20 miles of
the campus. The schools in which these teachers taught varied widely
with respect to demographic characteristics of students. They included
suburban, inner city, and semi-rural communities, and lower and middle
income neighborhoods. All recent graduates who were employed as
teachers in these districts were asked to allow observers to schedule
two visits to their classes. All but three teachers agreed.



The sample included three groups of graduates, each with
successively greater levels of professional experience, as follows:

Group A consisted of 14 beginning teachers with only one to two
months of professional teaching experience. The grade levels they
taught ranged from kindergarten to 11th grade.

Group B consisted of 35 teachers with five to eight months of
experience. Their grade levels also ranged from kindergarten to 11th
grade, including some ungraded classes such as home economics, music,
and physical education.

Group C included 21 second year teachers who were observed between
their 14th and 18th month of teaching. Their grade levels ranged from
kindergarten through 6th grade.




The observers were faculty members and graduate assistants from the
College of Education. Their backgrounds were heterogeneous, but all
were familiar with public schools and procedures, and most had
teaching experience. Fifteen observers participated in a four-hour
orientation and training program prior to the Spring, 1982 studies.
Subsequent reliability checks revealed that six of the eight pairs
demonstrated agreement greater than .50 (product moment correlations) on
at least 13 of the 16 scales. Two rater pairs revealed substantially
poorer agreement, and their observations were excluded from the Group B
data base.

Four experienced observers provided on-the-job training to four
novice observers for the Group C observations. The interrater agreement
levels for all the pairs exceeded the criterion of .50 for 13 of the 16
scales.
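
The agreement criterion described above (a product moment correlation greater than .50 on at least 13 of the 16 scales) can be sketched in a few lines. This is a minimal illustration and not the authors' procedure: the function names, the data layout (one list of ratings per scale, with None standing for a rating the observer did not make), and the pairwise deletion of missing ratings are all assumptions introduced here.

```python
from math import sqrt

def pearson(x, y):
    """Product moment correlation over pairs where both raters gave a rating."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # undefined when either rater shows no variation
    return sxy / sqrt(sxx * syy)

def pair_meets_criterion(rater_a, rater_b, threshold=0.50, min_scales=13):
    """rater_a, rater_b: dicts mapping scale letter -> list of ratings (1-5 or None)."""
    passed = 0
    for scale in rater_a:
        r = pearson(rater_a[scale], rater_b[scale])
        if r is not None and r > threshold:
            passed += 1
    return passed >= min_scales

# Hypothetical ratings for three scales only, so min_scales is lowered to 2.
a = {"A": [3, 4, 5, 2, 4], "B": [2, 3, 4, 4, 5], "C": [3, 3, 4, None, 2]}
b = {"A": [3, 5, 5, 2, 3], "B": [2, 3, 5, 4, 4], "C": [4, 2, 3, 3, 4]}
print(pair_meets_criterion(a, b, min_scales=2))
```

Under these assumptions, a rater pair whose ratings fail the check would be set aside, as was done for the two pairs excluded from the Group B data base.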



The Teacher and Pupil Ratings (TePPR) Scales 

The TePPR consists of sixteen scales: twelve which describe teacher
behaviors or aspects of performance inferred from behavior; one which
characterizes the physical aspects of the classroom environment; two
which represent pupil behavior; and one which consists of an overall
judgment of teaching performance (Nelsen et al., note 1; see appendix
for copy of the instrument). The rating levels for each scale range
from (1), representing "poor," to (5), representing "excellent"; (3)
represents "adequate" performance. Descriptive adjectives define these
varying levels for each scale.

The instructions stipulate that observation periods last 45 to 60
minutes, although experienced observers can complete the task in as
little as 30 minutes under optimal conditions. Instructions also
state that ratings should be based only on current performance during
the session, i.e., excluding recollections from previous observations or
other persons' reports about the teacher. Observers are also
instructed to signify "no basis for judgment" if classroom activities did not
provide a sufficient basis to observe behavior and form a judgment on a
particular scale.

Thus, the TePPR employed the following procedures to reduce
observer error: using multiple raters, rating from current exposure,
rater training, and behaviorally specific rating scales. These design
features were based primarily upon Fiske's suggested strategies
for personality assessment. They also correspond with the strategies
for reducing halo suggested by Cooper (1981), although development of
the TePPR (Nelsen et al., note 1) preceded our discovery of the Cooper
article.



Procedures 



Each teacher was observed simultaneously by the same pair of
observers on each of two occasions. Each observation session lasted 30
to 60 minutes. The observations were scheduled within three to five
weeks of one another. Principals and teachers were asked by letter to
participate in the project. Visits were scheduled in advance via phone
calls. Confidentiality of the ratings was assured, in that teachers
were told that no one other than project staff could see the ratings.
Teachers themselves were not shown their own ratings.

Raters were instructed to compare their ratings following each
session. Under no circumstances, however, were ratings to be changed on
the basis of these cross-checks.

The three groups were constituted of beginning teachers with
varying levels of experience. Group A teachers, with one to two months
of experience, were observed in Fall, 1982. Group B, with five to eight
months of experience, and Group C, with fourteen to eighteen months of
experience, were observed in Spring, 1982. Eight of the 21 teachers in
Group C had been observed previously, one year earlier, by different
observers, employing an earlier version of the instrument.



Results



One basis for evaluating the reliability of the observations is
provided by data on interrater agreement. Intercorrelations between the
ratings based upon simultaneous observation are presented in Table 1,
separately for the first and second occasion. For each scale and each
occasion, a set of three figures is presented, representing the
agreement coefficient for each of the three groups with differing experience
levels. For occasion 1, most of the coefficients fell between .5 and
.9. For occasion 2, most fell between .5 and 1.0. The median values for
the two occasions were .68 and .76, respectively.

On 13 of the 16 scales the agreement coefficients were at least .50
for at least five of the six reliability studies (within the
three experience level groups on the two occasions). The reliability
coefficients were slightly below this standard for Scale E, Sensitivity
to Pupil Comprehension; Scale K, Range of Teacher Interaction; and Scale
L, Classroom Management. Two scales revealed agreement coefficients
greater than .70 for all groups on both occasions: Scale F, Adaptation
to Individual Differences, and Scale J, Pupil Self Control and
Responsibility.
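
The standard applied in this paragraph (agreement of at least .50 in at least five of the six reliability studies for a scale) amounts to a simple count. The sketch below applies it to the six agreement coefficients for two scales as they read in Table 1 (Scale A and Scale E; groups A, B, and C on occasion 1, then on occasion 2). The function name and data layout are illustrative assumptions, and the coefficients are transcribed from a poorly reproduced table.

```python
def meets_standard(coefficients, threshold=0.50, min_count=5):
    """True if at least min_count coefficients are at or above the threshold."""
    return sum(1 for r in coefficients if r >= threshold) >= min_count

# Agreement coefficients as read from Table 1:
# groups A, B, C on occasion 1, then groups A, B, C on occasion 2.
scale_a = [0.75, 0.64, 0.45, 0.76, 0.62, 0.69]
scale_e = [0.58, 0.50, 0.10, 0.87, 0.60, 0.20]

for name, coeffs in (("A", scale_a), ("E", scale_e)):
    print(name, meets_standard(coeffs))
```

On these figures Scale A satisfies the standard and Scale E falls short of it, which matches the account given in the text.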



Stability of Ratings



A second purpose of the study was to determine the stability of
ratings across occasions. Data describing the stability provide another
basis for assessing the reliability of the ratings. Correlations
between the ratings on the two occasions are included in Table 1, but
for ease of comparison, they are also presented separately in Table 2.
Most of the stability coefficients were between .5 and .8. Indeed, for
each scale, the stability coefficients for at least two of the three
experience groups were .5 or greater, with the slight exception of Scale
K, for which the coefficients were .71, .48, and .15. The three
stability coefficients were quite consistent across the experience level
groups for certain scales (A, H, and I). However, they varied
considerably for other scales, especially scales B, G, K, and N. This
variability would seem to be attributable in large part to sampling
error, i.e., as a result of the small size of the samples, especially of
Groups A and C. Therefore, it would be unwise to infer trends
concerning differences in the stability coefficients. Indeed, there did
not seem to be any overall tendency for the stability coefficients to be
consistently higher or lower for the more versus less experienced
teachers.
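
The table notes describe each stability coefficient as the correlation between the combined ratings of observers A and B on occasion 1 and their combined ratings on occasion 2. A minimal sketch of that computation for a single scale follows. How the two observers' ratings were combined is not stated in the report; averaging them is an assumption made here, as are the function names and the sample ratings.

```python
from math import sqrt

def combine(rating_a, rating_b):
    """Average the two observers' ratings; use the one present if the other is missing."""
    present = [r for r in (rating_a, rating_b) if r is not None]
    return sum(present) / len(present) if present else None

def pearson(x, y):
    """Product moment correlation over teachers with a value on both occasions."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)

def stability(occ1_a, occ1_b, occ2_a, occ2_b):
    """Correlate combined occasion 1 ratings with combined occasion 2 ratings."""
    first = [combine(a, b) for a, b in zip(occ1_a, occ1_b)]
    second = [combine(a, b) for a, b in zip(occ2_a, occ2_b)]
    return pearson(first, second)

# Hypothetical ratings of five teachers on one scale.
occ1_a, occ1_b = [3, 4, 2, 5, 3], [3, 5, 2, 4, 4]
occ2_a, occ2_b = [4, 4, 3, 5, 3], [3, 5, None, 5, 3]
print(round(stability(occ1_a, occ1_b, occ2_a, occ2_b), 2))
```

With samples as small as those in Groups A and C, a coefficient computed this way can shift substantially when a single teacher's ratings change, which is consistent with the caution about sampling error above.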



Dimensionality

Another primary issue in the investigation concerned the
dimensionality of the ratings. Data concerning the dimensionality among the
scales were provided by factor analyses of the ratings. Data for
Group B only were analyzed, since the Ns for groups A and C were too
small to yield stable factors. Using the Statistical Package for the
Social Sciences (SPSS) principal components analysis program, data were
analyzed separately for occasions 1 and 2.

The extent of unidimensionality among the ratings on all scales is
reflected in the percent of variance explained by the first principal
component. Percentages of variance accounted for in the intercorrelation
matrices of the first and second occasions were 66.6 and 68.(3, 
respectively. 
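
The percent of variance explained by the first principal component of a correlation matrix is its largest eigenvalue divided by the number of scales (the trace of the matrix). The sketch below estimates that eigenvalue by power iteration in plain Python. It does not reproduce the SPSS routine used in the study, and the small matrix in which every pair of scales correlates .60 is a made-up example.

```python
def first_component_percent(corr, iterations=500):
    """Percent of total variance carried by the first principal component
    of a correlation matrix, using power iteration for the largest eigenvalue."""
    p = len(corr)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv (v has unit length) gives the eigenvalue estimate.
    rv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * rv[i] for i in range(p))
    return 100.0 * eigenvalue / p

# Hypothetical example: four scales that all intercorrelate .60.
r = 0.60
corr = [[1.0 if i == j else r for j in range(4)] for i in range(4)]
print(round(first_component_percent(corr), 1))
```

By the same arithmetic, a first component accounting for 66.6 percent of the variance among 16 scales corresponds to an eigenvalue of about 10.7 (0.666 x 16).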

There is also evidence that an additional basic dimension may be
differentiated within the matrices, reflected in the loadings on the
second principal component. Employing the criterion of accepting all
principal components with eigenvalues greater than 1.0, the first two
components were retained for both occasions 1 and 2. Following Kaiser's
(1958) varimax procedure, these components were subjected to orthogonal
rotation. The results of these analyses are presented in Table 3.
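
The two steps named in this paragraph can be illustrated briefly: the eigenvalue-greater-than-1.0 retention rule, and a varimax rotation of two retained components. For two factors, the raw varimax criterion is maximized by a single planar rotation whose angle has a closed form. The sketch uses that form without Kaiser's row normalization, so it is a simplified stand-in for the SPSS procedure, and the eigenvalues and loadings below are invented for illustration.

```python
from math import atan2, cos, sin

def retained(eigenvalues, cutoff=1.0):
    """Kaiser criterion: number of components with eigenvalue above the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)

def varimax_criterion(loadings):
    """Raw varimax criterion: for each factor, sum of fourth powers of the
    loadings minus the squared sum of squared loadings divided by p."""
    p = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        sq = [row[k] ** 2 for row in loadings]
        total += sum(s * s for s in sq) - sum(sq) ** 2 / p
    return total

def varimax_two_factors(loadings):
    """Rotate a p x 2 loading matrix by the angle that maximizes the raw criterion."""
    p = len(loadings)
    u = [x * x - y * y for x, y in loadings]
    v = [2 * x * y for x, y in loadings]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2 * sum(ui * vi for ui, vi in zip(u, v))
    phi = atan2(d - 2 * a * b / p, c - (a * a - b * b) / p) / 4
    return [[x * cos(phi) + y * sin(phi), -x * sin(phi) + y * cos(phi)]
            for x, y in loadings]

# Invented example: four scales, two unrotated components.
print(retained([10.7, 1.3, 0.8, 0.6]))
unrotated = [[0.70, 0.40], [0.75, 0.35], [0.60, -0.55], [0.65, -0.50]]
rotated = varimax_two_factors(unrotated)
for row in rotated:
    print([round(val, 2) for val in row])
```

Because the rotation is orthogonal, each scale's communality (the sum of its squared loadings) is unchanged; only the distribution of that variance between the two factors shifts.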

The factors for both occasions are similar. For both occasions,
Factor I includes loadings from all scales except Scales J, Pupil Self
Control, and M, Classroom Control. Although all other scales load on
this factor, among the high loadings that define the factor are: C,
Presentation of Subject Matter; E, Sensitivity to Pupil Comprehension;
G, Quality of Feedback; K, Range of Teacher Interaction; L, Classroom
Management; and N, Quality of Planning, as well as P, Overall Judgment
of Teaching Effectiveness. These scales, as well as the other scales,
include aspects of instructional directness, including effective
planning and management, interaction with many students, subject matter
knowledge, and clarity of presentation.

The second factor, which was similar for both occasions, was most
clearly defined by the two scales concerning behavioral control: J,
Pupil Self Control, and M, Classroom Control. The loadings of these
scales on the factor were greater than .8 on both occasions. This
factor also included scales with moderate loadings, i.e., between .40
and .60, on B, Clarity of Assignments and Smoothness of Transitions; H,
Demonstration of Personal Regard; I, Pupil Engagement; L, Classroom
Management; and P, Overall Judgment of Effectiveness.

Discussion



The correlations describing interrater agreement indicate that
judges with some knowledge of teaching and minimal training can achieve
moderate to high agreement when observing and rating a given classroom
session with the TePPR scales.

The interobserver agreement was slightly lower on the first
observation session than on the second, i.e., a median of .68 versus .76.
The higher agreement for the second occasion may result, at least in
part, from the comparisons and communication between the raters that
followed the first session. That is, they may have influenced one
another's judgments concerning aspects of the teachers' performance,
and subsequently remembered these judgments on the second
occasion. These communications may have also inflated the stability
coefficients, which were also moderate (.5 to .8) for most scales. A
design which would eliminate such spurious inflation of the stability
coefficients and the second occasion agreement coefficients would be
provided by a scheme in which the observations were conducted by
different pairs of observers on the two occasions. We recently employed
this design in a study in which the teachers were rated on separate
occasions by different observers.



The factor analytic results reveal a fairly high degree of
unidimensionality among the ratings on the 16 scales. This
unidimensionality may emanate from two sources. First, aspects of teaching
performance and pupil behaviors that reflect effective instruction
presumably are integrated and overlapping. Cooper (1981) refers to such
interrelationships as "true halo." Much as the cognitive skills that
underlie intellectual adaptation are manifested in an intellectual "g"
factor, so do mutually related teaching skills that underlie teaching
effectiveness manifest themselves in a "g" factor. Unfortunately,
aspects of teaching performance and pupil behaviors that reflect
teaching effectiveness may also be confounded in the minds of the
observers. Thus, perceptions, inferences, and attribution of skill
levels on some, if not all, of the scales may have been contaminated by
an underlying evaluative dimension, i.e., what Cooper refers to as
"illusory halo effects" among observer judgments. The influences of
these illusory halo effects and of true halo are both reflected in
the high common variance or unidimensionality among the scales.

To a large degree, the data preclude discrimination between these
two sources of unidimensional variance among the scales. A research
strategy to disentangle the true halo from the illusory halo is needed.
Presumably, a systematic program of research to identify sources of such
attributional errors in perception of teachers should include both
coding (low inference) and ratings (high inference) measures.
Table 1

Means, Standard Deviations, Interrater Reliability Coefficients, and Stability Coefficients
for TePPR Ratings of Teacher Performance on Two Occasions

                Occasion 1                       Occasion 2
Group   N(A)  N(B)  Mean    SD r(AB)     N(A)  N(B)  Mean    SD r(AB)    r(12)

A. Organization of Classroom
  A       14    13  3.55   .68   .75       12     6  3.16   .80   .76      .60
  B       34    34  3.81   .91   .64       33    35    --   .76   .62      .61
  C       21    21  4.09   .68   .45       21    21  4.49   .67   .69      .63

B. Clarity of Assignments, Transitions
  A       14    13  3.44   .64   .52       11     6  3.09  1.00   .91      .15
  B       34    33  3.70   .82   .57       31    33  3.57   .99   .83      .53
  C       19    20  3.95   .79   .68       21    20  4.20   .85   .51      .61

C. Presentation of Subject Matter
  A       12    11  3.42   .77   .68       10     5  3.30  1.24   .97      .73
  B       31    32  3.70   .93   .75       29    31  3.66  1.15   .77      .73
  C       20    21  3.78   .73   .51       20    17  4.05   .74   .67      .39

D. Questioning
  A       12    11  3.61    --   .56       11     5  3.17  1.10   .99      .50
  B       24    26  3.72   .93   .92       22    26  3.73   .96    --      .75
  C       20    20  3.75   .63   .56       18    17  3.88   .80   .57      .71

E. Sensitivity to Pupil Comprehension
  A       11    12  3.36   .70   .58       12     5  3.55   .93   .87      .64
  B       35    35  3.67   .95   .50       32    34  3.68   .91   .60      .41
  C       21    21  3.73   .66   .10       20    19  4.08   .62   .20      .53

F. Adaptation to Individual Differences
  A       12    10  3.59   .75   .82       11     6  3.14   .96   .94      .50
  B       33    35  3.36   .83   .76       28    31  3.50   .98   .62      .68
  C       18    20  3.70   .72   .75       17    19  3.73   .70   .81      .58

G. Quality of Feedback
  A       14    13  3.58  1.07   .65       12     5  3.50   .91   .81      .79
  B       33    32  3.31   .90   .75       29    32  3.37   .95   .68      .78
  C       --    21  3.81   .74   .65       20    21  4.31   .65   .40      .33

H. Demonstration of Regard
  A       13    --  3.53   .86   .36       11     5  3.56  1.07    --      .62
  B       --    34    --    --   .34       33    35    --  1.01   .65      .73
  C       --    21  3.58   .74   .17       20    20  3.97   .66   .63      .57

I. Pupil Engagement
  A       14    13  3.96   .85   .64       12     6  3.58  1.26   .80      .57
  B       35    34  4.16   .87   .81       33    35  4.09   .83   .61      .63
  C       21    21  4.47   .64   .30       21    22  4.53   .63   .77      .60

J. Pupil Self Control, Responsibility
  A       14    13  3.66   .70   .75       12     6  3.33  1.08   .75      .75
  B       35    35  4.08   .81   .87       33    35  4.01   .86    --      .39
  C       21    21  4.02   .94   .76       21    21  4.02   .96   .85      .77

K. Range of Teacher Interaction
  A       14    12  3.73   .61   .38       12     5  3.33   .96    --      .71
  B       35    34    --   .97   .85       33    35  3.71   .96   .59      .48
  C       21    21  3.76   .76   .67       20    20  4.30   .71   .17      .15

L. Classroom Management
  A       --    --  3.25   .65   .39       --    --  3.04    --    --      .70
  B       --    --  3.85   .98   .83       --    --    --  1.00    --      .53
  C       21    21  4.02   .84   .42       21    21  4.18   .89   .66      .68

M. Classroom Control
  A       14    13  3.58   .74   .50       12     6  3.29  1.20   .87      .72
  B       35    35    --   .88   .79       33    35  4.01   .98   .80      .49
  C       20    21  4.12   .84   .72       21    21  4.26  1.07   .90      .81

N. Quality of Planning
  A       13    11  3.41   .72   .76       12     6  3.20   .90   .74      .83
  B       34    35  3.59   .93   .84       33    35  3.35  1.06   .79      .54
  C       19    20  3.72   .78   .46       21    21  3.88   .85   .57      .19

O. Knowledge of Subject Matter
  A       12    11  3.35   .72   .83       12     6    --   .79   .73      .64
  B       34    32  3.74   .88   .86       33    35  3.61   .90   .76      .69
  C       20    21  3.14   .63   .69       19    21  3.80   .63   .53      .49

P. Overall Teaching Performance
  A       14    13  3.52   .81   .78       11     6  3.09  1.13   .84      .56
  B       35    35  3.57   .87   .68       35    35  3.59  1.06   .91      .72
  C       18    19  3.75   .68   .70       20    18  3.89   .93   .86      .71

Group A - Beginning teachers with one to two months of experience
      B - Five to eight months of experience
      C - Fourteen to eighteen months of experience

N(A), N(B), r(AB): "A" and "B" refer to arbitrary designations of each member of the rater pair.

r(12): Correlation between combined ratings of observer A and B on Occasion 1 with combined ratings of A and B on Occasion 2.

--: entry not legible in the source copy.
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Table 2 



Stability Coefficients for TePPR Ratings of Teacher Performance on Two Occasions^a

                                                       Experience Level

                                              First two       Second          Second year -
                                              months                          spring
                                              r^b             r^b             r^b
Scale                                         N = 11-12       N = 26-35       N = 18-21

A. Physical Organization of Classroom         [illegible]     .61             .63
B. Clarity of Assignments, Transitions        [illegible]     .53             .61
C. Presentation of Subject Matter             [illegible]     .73             .39
D. Questioning                                [illegible]     .75             .71
E. Sensitivity to Pupil Comprehension         .64             .41             .53
F. Adaptation to Individual Differences       .50             .68             .58
G. Quality of Feedback                        .79             .78             .33
H. Demonstration of Regard                    .52             .73             .67
I. Pupil Engagement                           .67             .63             .60
J. Pupil Self Control & Responsibility        .75             -.39            .77
K. Range of Teacher Interaction               .71             .48             .15
L. Classroom Management                       .70             .53             .68
M. Classroom Control                          .72             .49             .81
N. Quality of Planning                        .83             .54             .19
O. Knowledge of Subject Matter                .64             .69             .49
P. Overall Teaching Performance               .56             .72             .71

^a The two occasions were separated by about two to six weeks.
^b Correlation between combined ratings of observers A and B on Occasion 1 with combined
   ratings of A and B on Occasion 2.
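
The stability coefficients above are correlations between the two observers' combined ratings on Occasion 1 and their combined ratings on Occasion 2. The sketch below shows one way such a coefficient, and an interrater correlation for a single occasion, could be computed. It is an illustration only: the source does not say whether "combined" means a sum or a mean (a mean is assumed here), and the function names and the four-teacher data set are hypothetical, not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return sxy / sqrt(sxx * syy)


def combine(rater_a, rater_b):
    """Combine the two observers' ratings per teacher (assumption: the mean)."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]


def stability_coefficient(occ1_a, occ1_b, occ2_a, occ2_b):
    """Correlate combined Occasion 1 ratings with combined Occasion 2 ratings."""
    return pearson_r(combine(occ1_a, occ1_b), combine(occ2_a, occ2_b))


# Hypothetical ratings of four teachers on one scale (not data from the study).
occ1_a, occ1_b = [3, 4, 2, 5], [3, 5, 2, 4]
occ2_a, occ2_b = [4, 4, 3, 5], [3, 5, 2, 5]

interrater_occ1 = pearson_r(occ1_a, occ1_b)
stability = stability_coefficient(occ1_a, occ1_b, occ2_a, occ2_b)
print(f"interrater r = {interrater_occ1:.2f}, stability r = {stability:.2f}")
```

Because a correlation is unchanged by rescaling, summing the two raters instead of averaging them would give the same coefficient; the choice only matters for reported means and standard deviations.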
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Table 3 



Factor Analysis of Ratings for Two Occasions

                                               Occasion 1                  Occasion 2
Scale                                      Instruction   Control       Instruction   Control

A. Physical Organization of Classroom         .77*         .23            .61*         .16
B. Clarity of Assignments/Transitions         .55*         .57*           .73*         .52*
C. Presentation of Subject Matter             .74*         .44*           .87*         .28
D. Effectiveness of Questions                 .64*         .52*           .85*         [illegible]
E. Sensitivity to Pupil Comprehension         .78*         .24            [illegible]  [illegible]
F. Adaptation to Individual Differences       .78*         .23            .60*         .47*
G. Quality of Feedback                        .91*         .22            .73*         .35
H. Demonstration of Personal Regard           .70*         .41*           .54*         .45*
I. Pupil Engagement in Tasks                  .60*         .64*           .60*         .45*
J. Pupil Self Control                         .07          .98*           .19          .89*
K. Range of Teacher Interaction               .75*         .12            .72*         .25
L. Classroom Management                       .71*         .51*           .73*         .53*
M. Classroom Control                          .34          .81*           .27          .33*
N. Quality of Planning                        .74*         .40*           .79*         .38
O. Knowledge of Subject Matter                .68*         .30            .91*         .19
P. Overall Judgment                           [illegible]  -.49*          .76*         .57*

*Loading ≥ .40
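
The asterisks in Table 3 mark loadings at or above .40. As a small illustration of how that rule separates scales tied to one factor from scales that load on both, the sketch below applies the threshold to three Occasion 1 rows copied from the table. The function names are hypothetical, and the communality calculation (sum of squared loadings) assumes an orthogonal rotation, which the source does not state.

```python
SALIENT = 0.40  # threshold given in the table footnote
FACTORS = ("Instruction", "Control")

# Occasion 1 loadings (Instruction, Control) for three scales from Table 3.
loadings = {
    "G. Quality of Feedback": (0.91, 0.22),
    "I. Pupil Engagement in Tasks": (0.60, 0.64),
    "J. Pupil Self Control": (0.07, 0.98),
}


def salient_factors(row, threshold=SALIENT):
    """Return the names of the factors whose absolute loading meets the threshold."""
    return [name for name, value in zip(FACTORS, row) if abs(value) >= threshold]


def communality(row):
    """Sum of squared loadings; valid as shared variance only if factors are orthogonal."""
    return sum(value * value for value in row)


for scale, row in loadings.items():
    flagged = ", ".join(salient_factors(row)) or "none"
    print(f"{scale}: salient on {flagged}; communality = {communality(row):.2f}")
```

Scales such as Pupil Engagement in Tasks, which meet the threshold on both factors, are the ones that blur the distinction between the "direct instruction" and "classroom control" dimensions.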






TEACHER AND PUPIL PERFORMANCE (TePPR)

BACKGROUND





DATE OF OBSERVATION

TIME BEGUN

PROGRAM
□ Field Based; □ Campus Based

SCHOOL

DEPARTMENT (IN COLLEGE OF EDUCATION)
□ Elementary; □ Secondary; □ Special

GRADE LEVEL/SUBJECT

NAME OF OBSERVER

YEAR OF TEACHING
□ First; □ Second; □ 3 or more

SETTING (Describe the classroom setting and circumstances present during observation period)

PHYSICAL DESIGN OF CLASSROOM (CHECK ONE OR MORE)
□ Self-contained; □ Open; □ Team teaching; □ Resource room; □ Media center; □ (other)

STAFF PRESENT (SPECIFY IF MORE THAN ONE)
□ Aide(s); □ Co-teacher(s); □ Student teacher(s); □ (observers, parents, etc.)

ORGANIZATION OF INSTRUCTION (CHECK ONE OR MORE)
□ Whole class; □ One small group/individual seatwork; □ Small groups; □ Individualized

INSTRUCTIONAL MODE(S) (CHECK ONE OR MORE)
□ Lecture; □ Question-answer; □ Demonstration; □ Individual seatwork; □ Learning centers; □ (other)

SUBJECT MATTER TAUGHT (DURING OBSERVATION PERIOD. ESTIMATE NUMBER OF MINUTES FOR EACH).

MIN.    SUBJECT
        Reading
        Language Arts
        Mathematics
        Social Studies
        Science

NUMBER OF STUDENTS PRESENT

Comments (distinctive features of the situation, e.g., minority students, gifted class, handicapped students, unusual case, etc.):




PERFORMANCE RATINGS 

The instrument is designed to summarize observations and judgments of a teacher's instructional performance and pupils' behavior in a classroom
setting. The observation period should span approximately 45 to 60 minutes. The ratings should be based on direct observations of the
teacher's and pupils' behaviors during one observation period. Information from previous observations, other persons' reports of the teacher's
performance, etc., should not influence the ratings of performance on this occasion. Do not rate performance on a scale if your observation
period did not provide you with an opportunity to observe the behavior specified on that scale.

Basis for judgment. Lessons, activities, and teacher roles vary from one class period to another. Your opportunity to observe certain types
of teacher or pupil behavior will also vary from one class period to another. The "basis for judgment" ratings allow you to indicate whether
you had sufficient or insufficient opportunities to observe each type of behavior considered. Check "no basis for judgment" if the lesson did
not present any situations in which this form of performance could be observed and evaluated. Check "substantial basis for judgment" if the
lesson presented a sufficient number of episodes as a basis for judging performance on this dimension, or if the lesson included situations that
prompted or called for observable behavior relevant to this dimension. Indicate "limited" or "moderate" for class periods that provided bases
between "no basis" and "substantial."
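
For anyone keying completed forms into a data file, the instructions above suggest a simple record per scale: a 1 to 5 rating plus one of four "basis for judgment" levels. The sketch below is one hypothetical way to encode that, not part of the instrument. It assumes that a scale marked "no basis" carries no rating, which is an interpretation of the instruction not to rate behavior that could not be observed; the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Basis(Enum):
    NONE = "no basis"
    LIMITED = "limited"
    MODERATE = "moderate"
    SUBSTANTIAL = "substantial"


@dataclass(frozen=True)
class ScaleRating:
    scale: str              # scale letter on the form, "A" through "P"
    rating: Optional[int]   # 1-5, or None when the scale was not rated
    basis: Basis

    def __post_init__(self):
        # A rating, when present, must be on the form's five-point scale.
        if self.rating is not None and not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        # Assumption: "no basis for judgment" means the scale is left unrated.
        if self.basis is Basis.NONE and self.rating is not None:
            raise ValueError("a scale with no basis for judgment should not be rated")


def rated_only(records: List[ScaleRating]) -> List[ScaleRating]:
    """Keep only the scales that were actually rated."""
    return [r for r in records if r.rating is not None]


# Hypothetical entries for one observation.
observation = [
    ScaleRating("A", 4, Basis.SUBSTANTIAL),
    ScaleRating("D", None, Basis.NONE),      # no questioning occurred in the lesson
    ScaleRating("M", 3, Basis.LIMITED),
]
print([r.scale for r in rated_only(observation)])
```

Keeping the basis level alongside each rating would also let an analyst check whether agreement or stability differs for ratings made on limited versus substantial evidence.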



Developed by: Edward A. Nelsen

              William D. Ray

              Catharine C. Knight

© 1981 College of Education, Arizona ...            Weston T. Brook



A. Physical organization of classroom and instructional materials; utilization of space - furnishings efficiently arranged, pupils visible to teacher
and vice versa, adequacy of space for small group work; posted rules and directions are visible and readable; work materials are accessible.

□1              □2              □3              □4              □5
Poorly organized,               Adequately organized            Well organized,
poor visibility,                                                facilitative
limited accessibility

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



B. Clarity of assignments and smoothness of transitions to instructional activities - preciseness of directions and task structure; promptness of
class response.

□1              □2              □3              □4              □5
Unclear directions,             Adequate                        Clear directions, smooth, efficient transitions;
confusion, delays                                               pupils respond to directions and
                                                                begin assignments promptly

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS





C. Skillfulness in presentation of subject matter - clarity, relevance of content, comprehensibility of explanations, use of examples.

□1              □2              □3              □4              □5
Vague, confused, stereotypic,   Adequate                        Clear, precise, complete,
fragmented, oversimplified,                                     coherent, logical,
boring to pupils                                                interesting to pupils

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



D. Effectiveness, frequency, and level of questions - variety (e.g., open and closed questions), relevance, clarity of questions; extent to which questions require
student to mentally manipulate information or support an answer with logically reasoned evidence ("high" level or divergent versus "low ...

□1              □2              □3              □4              □5
Vague, narrow, stereotyped,     Occasional, fairly              Frequent, clear, varied,
unanswerable, or low            effective questions             answerable, stimulating,
cognitive questions                                             high cognitive questions

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



E. Sensitivity to pupil comprehension - responsiveness to pupil confusion, misunderstanding, boredom, distraction.

□1              □2              □3              □4              □5
Insensitive,                    Adequate awareness              Sensitive, aware,
unresponsive to confusion       and sensitivity                 responsive to pupil
                                                                understanding

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



F. Adaptation to individual ability differences of pupils - difficulty of assignments appropriate for ability levels of all pupils; adequate
wait-time; activities are challenging to pupils of different ability levels; appropriate pacing.

□1              □2              □3              □4              □5
Instruction too difficult       Difficulty level and            Highly responsive and
(or easy) for many              pace usually appropriate        sensitive to all ability
students or too slow            to most students                levels, appropriate pace

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



G. Quality of feedback - indication of correct/incorrect pupil responses, identification and clarification of correct and incorrect elements of ...

□1              □2              □3              □4              □5
Disparaging, vague,             Adequate                        Informative, prompt,
or entirely lacking                                             clear, helpful

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS













H. Demonstration of personal regard - compliments when appropriate, provides encouragement, courteous, friendly, enthusiastic. Includes ...

□1              □2              □3              □4              □5
Negative, indifferent, vague,   Moderately                      Enthusiastic, positive,
disparaging                     effective                       encouraging

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS




I. Pupil engagement in tasks - responsiveness to tasks, attentiveness, and persistence. (Observe at least three times during the class period ...

□1              □2              □3              □4              □5
Low student involvement,        Moderate involvement,           High involvement, more than 75%
less than 25% of pupils         about 50% of pupils engaged     of pupils engaged and attentive
engaged on tasks/activities                                     to tasks/activities most of time

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



J. Pupil self control, responsibility for behavior - pupil compliance with classroom procedures and rules on own volition.

□1              □2              □3              □4              □5
Pupils act disruptively,        Majority of pupils control      Pupils maintain order
require continual monitoring    selves most of time but         without direct teacher
and discipline                  several do not comply           intervention
                                with procedures

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



K. Range of teacher interaction - teacher interacts with all pupils, not just a few select individuals or groups, e.g., on basis of ability level or
location in the classroom, sex or ethnicity.

□1              □2              □3              □4              □5
Consistently ignores or         Adequate consideration          Impartially attentive and
criticizes certain children,    and distribution of             responsive to all pupils; action
narrow action zone              attention                       includes entire class or group

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



L. Classroom management - appropriate activities, efficient use of time, organization of activities, alternative tasks available for children who ...

□1              □2              □3              □4              □5
No activities for               Adequate activities             Appropriate
some children,                  and use of time                 activities provided;
poor use of time                                                efficient use of time

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS




M. Classroom control - anticipation and control over potentially disruptive situations and behaviors; consistent enforcement of rules, orderly
classroom procedures.

□1              □2              □3              □4              □5
Lack of control,                Occasional disruptions,         Appropriate control and order maintained,
chaos prevails,                 but sufficient order to         few problems, minor problems resolved
erratic enforcement of rules    conduct instruction             without disrupting class

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



N. Quality of planning for this lesson/activity - inferred from organization, evidence of goals, clarity of objectives, availability of resources.

□1              □2              □3              □4              □5
Poorly planned,                 Adequate planning               Well planned,
fragmented activities,                                          organized, clear objectives,
lacking objectives                                              lessons maintain interest

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS



O. Teacher's knowledge of subject matter - correctness of information, clarity of explanations, relevance of examples, flexibility, elaboration.

□1              □2              □3              □4              □5
Deficient in [illegible]        Adequate                        Mastery of subject,
knowledge, teaches only                                         presents from more than one
from manual                                                     viewpoint, uses good examples

BASIS FOR JUDGMENT
□ No basis; □ Limited; □ Moderate; □ Substantial

STRENGTHS/LIMITATIONS




P. Judgment of overall teaching performance during this observation.

□1              □2              □3              □4              □5
Not adequate    Marginal        Adequate                        Excellent,
                                                                well planned, stimulating,
                                                                cohesive session

(Additional comments on following page).

TIME COMPLETED



Comments: 



Interview: 



[Rating chart; legible labels: "Teacher-pupil", "Pupil", "Unsatisfactory"]

5 Excellent
4 Good
3 Adequate
2 Marginal
1 Poor